Evolutionary discriminative confidence estimation for spoken term detection
The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-011-0913-z

Spoken term detection (STD) is the task of searching for occurrences of spoken terms in audio archives. It relies on robust confidence estimation to make a hit/false alarm (FA) decision. In order to optimize the decision in terms of the STD evaluation metric, the confidence has to be discriminative. Multi-layer perceptrons (MLPs) and support vector machines (SVMs) exhibit good performance in producing discriminative confidence; however, they are severely limited by their continuous objective functions and are therefore less capable of dealing with complex decision tasks. This leads to a substantial performance reduction in the detection of out-of-vocabulary (OOV) terms, where the high diversity in term properties usually leads to a complicated decision boundary.
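The hit/FA decision amounts to comparing each detection's confidence with a threshold; raising the threshold trades false alarms for misses. The equal error rate (EER) reported later in this abstract is the operating point where the miss rate and the false-alarm rate coincide. The following is a minimal sketch, assuming a list of confidence scores with binary labels (1 = true hit, 0 = false alarm). It approximates the EER by trying every observed score as the threshold and keeping the one where the two rates are closest. The function name and toy data are illustrative; the abstract does not describe the paper's exact EER procedure.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER of confidence scores against hit (1) / false-alarm (0) labels."""
    n_hit = sum(1 for y in labels if y == 1)
    n_fa = len(labels) - n_hit
    if n_hit == 0 or n_fa == 0:
        raise ValueError("need at least one hit and one false alarm")
    best_gap, eer = float("inf"), None
    # Try each observed score as the decision threshold (accept if score >= t).
    for t in sorted(set(scores)):
        miss_rate = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_hit
        fa_rate = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_fa
        gap = abs(miss_rate - fa_rate)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap, eer = gap, (miss_rate + fa_rate) / 2
    return eer


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(equal_error_rate(scores, labels))  # prints 0.25
```

With many scores, real evaluations usually interpolate between neighbouring thresholds; picking the closest observed threshold keeps the sketch short.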
In this paper we present a new discriminative confidence estimation approach based on evolutionary discriminant analysis (EDA). Unlike MLPs and SVMs, EDA uses the classification error as its objective function, resulting in a model optimized towards the evaluation metric. In addition, EDA combines heterogeneous projection functions and classification strategies in decision making, leading to a highly flexible classifier that is capable of dealing with complex decision tasks. Finally, the evolutionary strategy of EDA reduces the risk of converging to a local minimum. We tested the EDA-based confidence with a state-of-the-art phoneme-based STD system on an English meeting-domain corpus. The STD system employs a phoneme speech recognition system to produce lattices, within which the phoneme sequences corresponding to the query terms are searched. The test corpora comprise 11 hours of speech data recorded with individual head-mounted microphones from 30 meetings carried out at several institutes, including ICSI; NIST; ISL; LDC; the Virginia Polytechnic Institute and State University; and the University of Edinburgh. The experimental results demonstrate that EDA considerably outperforms MLPs and SVMs on both classification and confidence measurement in STD, and that the advantage is more significant on OOV terms than on in-vocabulary (INV) terms. In terms of classification performance, EDA achieved an equal error rate (EER) of 11% on OOV terms, compared to 34% and 31% with MLPs and SVMs respectively; for INV terms, EDA obtained an EER of 15%, compared to 17% with both MLPs and SVMs. In terms of STD performance for OOV terms, EDA achieved a significant relative improvement of 1.4% and 2.5% in average term-weighted value (ATWV) over MLPs and SVMs respectively.

This work was partially supported by the French Ministry of Industry (Innovative Web call) under contract 09.2.93.0966, "Collaborative Annotation for Video Accessibility" (ACAV), and by "The Adaptable Ambient Living Assistant" (ALIAS) project, funded through the joint national Ambient Assisted Living (AAL) programme.
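EDA's central idea is to optimize the classification error directly with an evolutionary search rather than through a smooth surrogate loss. A deliberately simplified sketch can illustrate it. This is a generic (1 + λ) evolution strategy over a single linear projection and bias, not the EDA algorithm itself, which the abstract describes as combining heterogeneous projection functions and classification strategies. The function names, the two-feature toy data, and the hyperparameters (`generations`, `offspring`, `sigma`) are illustrative assumptions.

```python
import random


def classification_error(params, data):
    """Fraction of misclassified samples: a piecewise-constant 0/1 objective."""
    w1, w2, b = params
    wrong = 0
    for (x1, x2), label in data:
        predicted = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        wrong += predicted != label
    return wrong / len(data)


def evolve(data, generations=200, offspring=20, sigma=0.5, seed=0):
    """(1 + lambda) evolution strategy over a linear projection (w1, w2) and bias b."""
    rng = random.Random(seed)
    parent = [rng.gauss(0.0, 1.0) for _ in range(3)]
    parent_err = classification_error(parent, data)
    for _ in range(generations):
        if parent_err == 0.0:
            break  # nothing left to improve on the training data
        # Mutate every parameter with Gaussian noise to create the offspring.
        children = [[p + rng.gauss(0.0, sigma) for p in parent] for _ in range(offspring)]
        scored = [(classification_error(c, data), c) for c in children]
        best_err, best = min(scored, key=lambda pair: pair[0])
        # Elitist selection: accept ties too, so the search can drift across flat regions.
        if best_err <= parent_err:
            parent, parent_err = best, best_err
    return parent, parent_err


# Hypothetical toy data: ((feature_1, feature_2), label) with 1 = hit, 0 = false alarm.
toy = [((2, 2), 1), ((3, 1), 1), ((2.5, 3), 1), ((3, 3), 1),
       ((-1, -1), 0), ((0, -2), 0), ((-2, 0), 0), ((-1, 1), 0)]
params, err = evolve(toy)
print(f"training error: {err:.2f}")
```

Because the 0/1 error is piecewise constant, it has no useful gradient. The search therefore relies only on comparing error values. Accepting children whose error merely equals the parent's prevents it from stalling on plateaus.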
Adaptive framing based similarity measurement between time warped speech signals using Kalman filter
Similarity measurement between speech signals aims to calculate the degree of similarity using acoustic features, and it has been receiving much interest due to the need to process large volumes of multimedia information. However, dynamic properties of speech signals, such as varying silence segments and the time warping factor, make it more challenging to measure the similarity between speech signals. This manuscript extends our research on adaptive framing based similarity measurement between speech signals using a Kalman filter. Silence removal is enhanced by integrating multiple features for detecting voiced and unvoiced speech segments. The adaptive frame size measurement is improved by using the acceleration/deceleration phenomenon of object linear motion. A dominant feature set is used to represent the speech signals, along with pre-calculated model parameters that are set by offline tuning of a Kalman filter. Performance is evaluated on additional datasets to assess the impact of the proposed model and silence removal approach on time-warped speech similarity measurement. Detailed statistical results show an overall accuracy improvement from 91% to 98%, demonstrating the superiority of the extended approach over our previous work on time-warped continuous speech similarity measurement.
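The abstract does not spell out the filter equations. As a generic illustration of a Kalman filter whose noise parameters are fixed in advance (in the spirit of the abstract's "pre-calculated model parameters"), the sketch below smooths a one-dimensional feature trajectory using a scalar random-walk state model. The state model, the parameter names `process_var` and `measurement_var`, and their values are assumptions, not the authors' configuration.

```python
def kalman_smooth(measurements, process_var=1e-3, measurement_var=0.1,
                  initial_estimate=0.0, initial_var=1.0):
    """Scalar Kalman filter with a random-walk state model.

    process_var and measurement_var stand in for parameters fixed in advance
    (e.g. by offline tuning); the default values are illustrative assumptions.
    """
    x, p = initial_estimate, initial_var
    estimates = []
    for z in measurements:
        # Predict: the random-walk model keeps the state and grows its uncertainty.
        p = p + process_var
        # Update: blend prediction and measurement according to the Kalman gain.
        k = p / (p + measurement_var)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy per-frame feature values.
noisy = [1.0, 1.4, 0.7, 1.2, 0.9, 1.3, 0.8, 1.1]
print([round(v, 2) for v in kalman_smooth(noisy, initial_estimate=noisy[0])])
```

A smaller `process_var` relative to `measurement_var` makes the estimate smoother but slower to follow genuine changes. Fixing this ratio offline is one way to avoid re-estimating it for every utterance.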
Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion
The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8

Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and provides an in-depth analysis based on several search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).

This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund and the Galician Regional Government (GRC2014/024, "Consolidation of Research Units: AtlantTIC Project" CN2012/160).
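The ALBAYZIN abstract does not name its evaluation metrics. Assuming they follow the NIST STD 2006 term-weighted value, the average term-weighted value (ATWV) can be computed from per-term counts at the system's decision threshold. ATWV is the same metric reported in the EDA abstract earlier in this list. The computation is: ATWV = 1 − mean over terms of [P_miss + β · P_FA], where P_FA divides a term's false alarms by the seconds of speech minus its true occurrences. Under the NIST 2006 settings, β = 999.9 follows from a cost/value ratio of 0.1 and a prior term probability of 10^-4. `TermCounts` and `atwv` are hypothetical helper names. The sketch assumes that detections have already been matched to reference occurrences, for example within a time tolerance.

```python
from dataclasses import dataclass


@dataclass
class TermCounts:
    n_true: int     # reference occurrences of the term
    n_correct: int  # accepted detections that match a reference occurrence
    n_false: int    # accepted detections that match no reference occurrence


def atwv(terms, speech_seconds, beta=999.9):
    """Actual term-weighted value from per-term counts at the system's threshold."""
    # Terms that never occur in the reference are excluded from the average.
    scored = [t for t in terms if t.n_true > 0]
    if not scored:
        raise ValueError("no term occurs in the reference")
    total = 0.0
    for t in scored:
        p_miss = 1.0 - t.n_correct / t.n_true
        # Non-target trials: roughly one per second of speech not holding the term.
        p_fa = t.n_false / (speech_seconds - t.n_true)
        total += p_miss + beta * p_fa
    return 1.0 - total / len(scored)


# Hypothetical counts for two search terms over one hour of speech.
print(atwv([TermCounts(4, 3, 1), TermCounts(2, 2, 0)], speech_seconds=3600))
```

The large β means a single false alarm costs about as much as missing a term entirely when speech duration is a few hours. This is why per-term score normalization and threshold choice matter so much in STD systems.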